Second-generation PLINK: rising to the challenge of larger and richer datasets

Authors

  • Christopher C Chang
  • Carson C Chow
  • Laurent CAM Tellier
  • Shashaank Vattikuti
  • Shaun M Purcell
  • James J Lee
Abstract

BACKGROUND: PLINK 1 is a widely used open-source C/C++ toolset for genome-wide association studies (GWAS) and research in population genetics. However, the steady accumulation of data from imputation and whole-genome sequencing studies has exposed a strong need for faster, more scalable implementations of key functions such as logistic regression, linkage disequilibrium estimation, and genomic distance evaluation. In addition, GWAS and population-genetic data now frequently contain genotype likelihoods, phase information, and/or multiallelic variants, none of which can be represented by PLINK 1's primary data format.

FINDINGS: To address these issues, we are developing a second-generation codebase for PLINK. The first major release from this codebase, PLINK 1.9, introduces extensive use of bit-level parallelism, $O(\sqrt{n})$-time/constant-space Hardy-Weinberg equilibrium and Fisher's exact tests, and many other algorithmic improvements. In combination, these changes accelerate most operations by 1-4 orders of magnitude and allow the program to handle datasets too large to fit in RAM. We have also developed an extension to the data format that adds low-overhead support for genotype likelihoods, phase, multiallelic variants, and reference vs. alternate alleles. This extension is the basis of our planned second release (PLINK 2.0).

CONCLUSIONS: The second-generation versions of PLINK will offer dramatic improvements in performance and compatibility. For the first time, users without access to high-end computing resources can perform several essential analyses of the feature-rich and very large genetic datasets now coming into use.

Similar resources

The Chinese Healthcare Challenge; Comment on “Shanghai Rising: Avoidable Mortality as Measured by Avoidable Mortality since 2000”

Investments in the extension of health insurance coverage, the strengthening of public health services, as well as primary care and better hospitals, highlight the emerging role of healthcare as part of China’s new growth regime, based on an expansion of services, and redistributive policies. Such investments, apart from their central role in terms of relief for low-income people, serve to reb...

The role of creative economics in entrepreneurship and revenue generation of public libraries: A systematic review

Purpose: The present study was conducted to identify the status of research on entrepreneurship and income generation in public libraries with a focus on creative economics. Method: The present study is a systematic review. The statistical population of this study was all research on the creative economy in public libraries that has been published in connection with entrepr...

Automatic segmentation of glioma tumors from BraTS 2018 challenge dataset using a 2D U-Net network

Background: Glioma is the most common primary brain tumor, and early detection of tumors is important in treatment planning for the patient. Precise segmentation of the tumor and intratumoral areas on MRI by a radiologist is the first step in diagnosis; besides being time-consuming, it can also yield different diagnoses from different physicians. The aim of this study...

A Study of the Evolution of War Rugs in Afghanistan

 One of the outcomes of the long wars in Afghanistan is the emergence of "war rugs". In this paper, the evolution of the war rugs in Afghanistan has been studied using a descriptive-analytical method and the data has been collected using field study and library-based research. The results of the research indicated that from the beginning to the present time, the war rugs in Afgha...

A New Document Embedding Method for News Classification

Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into predefined categories. A large amount of news is spread across the web. A text classifier can categorize news automatically, which facilitates and accelerates access to the news. The first step in text classification is to represent documents in a suitable way t...

Journal title:

Volume 4, Issue

Pages -

Publication date: 2015